EDA DPF Failure

We want to find out which factors lead to DPF failure.

idle_duration_mins: the total duration (minutes) the vehicle was idling for the day

dpf_regen_inhibited_duration_mins: the total duration (minutes) where dpf regen was inhibited for the day (regen cannot occur even if it needs to)

dpf_regen_not_active_duration_mins: the total duration (minutes) where dpf regen was not active for the day (regen not taking place)

dpf_regen_needed_duration_mins: the total duration (minutes) where dpf regen was reported as being needed for the day (regen needed but is not taking place)

dpf_regen_inhibit_switch_active_duration_mins: the total duration (minutes) where the driver-accessible dpf inhibit switch was active for the day (regen inhibited by the driver)

Line Graphs

Platform IDs

Here we see that the number of trucks varies from one platform ID to another.

The minimum number of trucks in any given platform is 48, but most platforms have exactly 729 trucks.

We have a total of 161 platforms in the dataset.
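The counts above can be checked with a groupby. This is a minimal sketch on toy data, assuming the columns are called `platform_id` and `vehicle_id`; both names are hypothetical and should be swapped for the real ones.

```python
import pandas as pd

# Toy data. The column names `platform_id` and `vehicle_id` are assumptions,
# not names confirmed by the dataset.
df = pd.DataFrame({
    "platform_id": ["A", "A", "A", "B", "B", "C"],
    "vehicle_id": [1, 2, 3, 4, 4, 5],
})

# Number of distinct trucks per platform ID.
trucks_per_platform = df.groupby("platform_id")["vehicle_id"].nunique()

n_platforms = trucks_per_platform.size            # total number of platforms
min_trucks = trucks_per_platform.min()            # smallest truck count
mode_trucks = trucks_per_platform.mode().iloc[0]  # most common truck count

print(trucks_per_platform)
print(n_platforms, min_trucks, mode_trucks)
```

On the real data, `n_platforms`, `min_trucks` and `mode_trucks` would be compared against the 161, 48 and 729 quoted above.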

We want to make sure we have enough data before the failure happens, so we require that no failure occurs within the first 15 days.
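One way to apply the 15-day rule is sketched below on toy data. The column names `vehicle_id`, `date` and `dpf_failure` are hypothetical, and treating "within the first 15 days" as "fewer than 15 days between the first record and the first failure" is an assumption.

```python
import pandas as pd

MIN_HISTORY_DAYS = 15  # from the text: no failure within the first 15 days

# Toy daily records for three trucks. All column names are assumptions.
dates = pd.date_range("2021-01-01", periods=20, freq="D")
df = pd.concat(
    [
        pd.DataFrame({"vehicle_id": "early", "date": dates,
                      "dpf_failure": [0] * 4 + [1] + [0] * 15}),
        pd.DataFrame({"vehicle_id": "late", "date": dates,
                      "dpf_failure": [0] * 17 + [1] + [0] * 2}),
        pd.DataFrame({"vehicle_id": "none", "date": dates,
                      "dpf_failure": [0] * 20}),
    ],
    ignore_index=True,
)

# First record and first failure date per truck.
first_seen = df.groupby("vehicle_id")["date"].min()
first_failure = df[df["dpf_failure"] == 1].groupby("vehicle_id")["date"].min()

# Days of history before the first failure (NaN for trucks that never fail).
days_before_failure = (first_failure - first_seen).dt.days

# Trucks whose first failure comes too early to have enough history.
too_early = days_before_failure[days_before_failure < MIN_HISTORY_DAYS].index

kept = df[~df["vehicle_id"].isin(too_early)]
print(sorted(kept["vehicle_id"].unique()))
```

Trucks with no failure at all are kept, because a missing failure date never satisfies the `< MIN_HISTORY_DAYS` comparison.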

We may have to aggregate the dates to make the data less granular. We may also need some windowing: take 15 rows at a time and feed them into tsfresh. For each batch of 15 rows, we are trying to make a prediction.
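The windowing idea can be sketched in pandas as follows. This uses non-overlapping windows of 15 rows per truck and drops any incomplete trailing window; whether windows should overlap is not decided above, so that choice is an assumption, as are the column names `vehicle_id` and `date`.

```python
import pandas as pd

WINDOW = 15  # rows per window, as proposed in the text

# Toy data: 40 daily rows for one truck. `vehicle_id` and `date` are
# assumed column names; `idle_duration_mins` is one of the real features.
df = pd.DataFrame({
    "vehicle_id": "truck_1",
    "date": pd.date_range("2021-01-01", periods=40, freq="D"),
    "idle_duration_mins": list(range(40)),
})

# Number each row within its truck, then bucket into blocks of WINDOW rows.
df = df.sort_values(["vehicle_id", "date"])
df["window_idx"] = df.groupby("vehicle_id").cumcount() // WINDOW

# Keep only complete windows (the last 10 rows here form a partial window).
rows_in_window = df.groupby(["vehicle_id", "window_idx"])["date"].transform("count")
windows = df[rows_in_window == WINDOW].copy()

# One identifier per (truck, window) pair, usable as a time series id.
windows["window_id"] = (
    windows["vehicle_id"] + "_" + windows["window_idx"].astype(str)
)

print(windows.groupby("window_id").size())
```

The resulting long-format frame has one id per window, so `window_id` could be passed as `column_id` and `date` as `column_sort` to tsfresh's `extract_features`, giving one feature row (and one prediction) per 15-row batch.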

For each of the 69 trucks that don't have DPF failure, how much time series data do you want to feed? This is where we want to do that windowing.

Wrangling

To have a standard number of time series windows, I drop the platform IDs that do not have the top number of vehicles in them. I keep the result as a separate dataset.
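A minimal sketch of that filtering step, on toy data. It reads "top number" as the maximum truck count across platforms, which is an assumption, and the column names `platform_id` and `vehicle_id` are hypothetical.

```python
import pandas as pd

# Toy data. Column names are assumptions.
df = pd.DataFrame({
    "platform_id": ["A"] * 3 + ["B"] * 3 + ["C"] * 2,
    "vehicle_id": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Distinct trucks per platform ID.
counts = df.groupby("platform_id")["vehicle_id"].nunique()

# Assumption: "top number" means the maximum count across platforms.
top_count = counts.max()
keep_ids = counts[counts == top_count].index

# Separate dataset; the original `df` is left untouched.
df_top = df[df["platform_id"].isin(keep_ids)].copy()

print(sorted(df_top["platform_id"].unique()))
```

If "top" is instead meant as the most common count, `counts.mode().iloc[0]` can replace `counts.max()`.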

Correlation